An Analysis of POS Tag Patterns in Ontology Identifiers and Labels
Abstract
I describe an analysis of the syntax of identifier names found in a corpus of over 500 ontologies. The analysis was performed in six steps: (i) extraction of identifier names from the corpus; (ii) construction of dummy sentences containing the identifiers; (iii) part-of-speech (POS) tagging; (iv) extraction of POS tag strings; (v) POS string frequency analysis; and (vi) general syntactic pattern analysis. The analysis found that identifier names follow simple syntactic patterns, that each type of identifier can be expressed through relatively few patterns, and that the syntax of identifiers differs from natural English in consistent ways.
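The steps listed in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the carrier-sentence prefixes in `CARRIER_PREFIX`, the tiny `TOY_LEXICON` that stands in for a real POS tagger, and the helper names `split_identifier`, `pos_pattern`, and `pattern_frequencies`. A real analysis would use a trained tagger, for which the surrounding sentence context matters.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real POS tagger (assumption).
TOY_LEXICON = {
    "this": "DT", "a": "DT", "it": "PRP", ".": ".",
    "has": "VBZ", "is": "VBZ", "part": "NN", "of": "IN",
    "heart": "NN", "valve": "NN", "disease": "NN",
    "located": "VBN", "in": "IN",
}

# Hypothetical carrier-sentence prefixes per identifier type (assumption).
CARRIER_PREFIX = {"class": ["this", "is", "a"], "property": ["it"]}


def split_identifier(name):
    """Split camelCase, snake_case, and hyphenated names into lowercase tokens."""
    spaced = re.sub(r"[_\-]+", " ", name)
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", spaced)
    return [token.lower() for token in spaced.split()]


def toy_tag(words):
    """Tag each word by lexicon lookup, defaulting to NN for unknown words."""
    return [(word, TOY_LEXICON.get(word, "NN")) for word in words]


def pos_pattern(name, kind):
    """Embed the identifier in a dummy sentence, tag it, and return its tag string."""
    tokens = split_identifier(name)
    prefix = CARRIER_PREFIX[kind]
    tagged = toy_tag(prefix + tokens + ["."])
    # Keep only the tags that belong to the identifier itself.
    ident_tags = [tag for _, tag in tagged[len(prefix):len(prefix) + len(tokens)]]
    return " ".join(ident_tags)


def pattern_frequencies(identifiers):
    """Count how often each POS tag string occurs across (name, kind) pairs."""
    return Counter(pos_pattern(name, kind) for name, kind in identifiers)


if __name__ == "__main__":
    sample = [
        ("HeartValve", "class"),
        ("heart_disease", "class"),
        ("hasPart", "property"),
        ("isLocatedIn", "property"),
    ]
    for pattern, count in pattern_frequencies(sample).most_common():
        print(count, pattern)
```

Slicing the identifier's tags out of the tagged carrier sentence mirrors the separation between the tagging step and the tag-string extraction step; the frequency table is then the input to any later pattern analysis.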
Similar resources
Phrase-boundary translation model using shallow syntactic labels
The phrase-boundary model for statistical machine translation labels the rules with classes of boundary words on the target-side phrases of the training corpus. In this paper, we extend the phrase-boundary model using shallow syntactic labels including POS tags and chunk labels. With the priority of chunk labels, the proposed model names non-terminals with shallow syntactic labels on the boundaries of ...
Introducing a machine-based approach using the Lesk algorithm and part-of-speech tagging for word sense disambiguation
The present study introduces a machine-based approach for word sense disambiguation (WSD). In Persian, a morphologically complex language in which many homographs occur, one way of doing WSD is to assign the right part-of-speech (POS) tags to words prior to WSD. Since the frequency of noun and adjective homographs in different Persian POS-tagged text corpora is high, POS tag disambi...
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. However, although originating in computational linguistics, the value of treebanks is becoming more widely appreciated in linguistics research as a whole. F...
A Survey of Identifiers and Labels in OWL Ontologies
We present a survey of the usage and style of identifiers and labels of named entities in a corpus of OWL ontologies. We investigated the frequency of use of both labels and meaningful or meaningless identifiers in those ontologies. We also surveyed common practices of lexical encoding styles for identifiers. We found that most ontologies do not use labels for named entities. When they do use l...
HMM Based Chunker for Hindi
This paper presents an HMM-based chunk tagger for Hindi. Various tagging schemes for marking chunk boundaries are discussed along with their results. Contextual information is incorporated into the chunk tags in the form of part-of-speech (POS) information. This information is also added to the tokens themselves to achieve better precision. Error analysis is carried out to reduce the number of c...